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Abstract 



Transferring information from observations of a dynamical system to estimate the 
fixed parameters and unobserved states of a system model can be formulated as the 
evaluation of a discrete time path integral in model state space. The observations 
serve as a guiding potential working with the dynamical rules of the model to direct 
system orbits in state space. The path integral representation permits direct numerical 
evaluation of the conditional mean path through the state space as well as conditional 
moments about this mean. Using a Monte Carlo method for selecting paths through 
state space we show how these moments can be evaluated and demonstrate in an inter- 
esting model system the explicit influence of the role of transfer of information from the 
observations. We address the question of how many observations are required to esti- 
mate the unobserved state variables, and we examine the assumptions of Gaussianity 
of the underlying conditional probability. 

1 Introduction 

Using observations to estimate unknown parameters and unobserved state variables in a 
model of a dynamical system when the measurements are noisy, when the model has errors, 
and when the initial state of the system is uncertain is a challenge across many areas of 
scientific inquiry. The goals in the estimation procedure might focus only on the knowledge 
of fixed parameters to characterize a model and allow, with a new initial state, the prediction 
of a response to new forcing or stimuli, or one may wish to know both the parameters and 
the full state of the model system at the end of data acquisition to permit prediction from 
that point forward in time. 

If the model system, and presumably the physical system it represents, expresses chaotic 
orbits, then repeated acquisition of new data to inform it of the correct location in its state 
space is likely required. Absent observations in occasional temporal windows, the positive 
Lyapunov exponents of the model system will cause errors in the state variables at the end of 
the assimilation window to grow exponentially rapidly, leading to the loss of all predictability 
of a specific orbit. Forecasts in geophysical settings often fall into this category. 

We have not made a thorough survey of the arenas where state and parameter estimation 
problems are important, but we have examined specific papers estimating parameters and 
states in neurobiology [HE], systems biology [3], atmospheric and oceanic sciences [HE], toxi- 
cology [6] biomedical engineering [7], cell biology [HI El, chemical engineering [10J, coastal and 
estuarine modeling [11], wastewater treatment [T2j , biochemistry [13], and immunology [H] 
as examples. Constructing observers in control theory [15] also deals with state estimation 
from observed data. 

This problem requires the evaluation of a discrete time path integral [16] through the state 
space of the model system where measurements act as a 'potential' transferring information 
to the model and guiding it toward the measured state and away from phase space locations 
where the model chaotic behavior might carry it. 



2 



2 Path Integral Formulation 



We start with observations of L quantities y(n) = {yi(t n ),y 2 (t n ), ...,y L (t n )} made in an 
observation window at the discrete times t n = {to, t%, t m }. These are known functions 
h[(w(t)); I = 1, 2, L of the state w(t) of the observed system. We construct a model of this 
system in D-dimensional space x(t n ) = x(n) = {xi(n),X2(n), XL(n),xi + i(n), ...,xd(ti)} 
and associate the first L components of x(t n ) = {xi(n), xl(ti)} with the measurements 
yi(n);l = 1,2, ...,L [16J. The other D — L state variables are unobserved. The model has 
time-independent parameters p, including those inherited from the measurement functions 
hi. The model dynamics is constructed on the basis of physical or biophysical reasoning not 
discussed here, and it is Markov: the state x(n + 1) is given in terms of the state x(n) and 
the p via the dynamical rule g a (x(n), x(n + 1), p) = 0; a = 1, 2, D; n — 0, 1, 2, m — 1. 

In the situation where measurements are noisy, the model has errors, and the initial state 
of the system is uncertain, we seek to estimate the conditional probability P(x(m) | Y(m)) [THJ 
[19] that the model state at time t m is at x(m), given the observations Y(m) = {y(m), y(m — 
1), ...,y(0)}. Using Bayes' rule [17J and the Chapman- Kolmogorov equation, both identities, 
we have the exact recursion relation 

P(x(m)|Y(m)) = exp[M/(x(m),y(m)|Y(m - 1))] 

j d D x{m - l)P(x(m))|x(m - l))P(x(m - l)|Y(m - 1)), (1) 

where the conditional mutual information between the model state x(m) and the observations 
y(m), given the previous observations Y(m — 1), is [2U] 

P(x(m), y(m)|Y(m — 1)) 



M/(x(m), y(m)|Y(m — 1)) = log 



(2) 



P(x(m)|Y(m - 1)) P(y(m)|Y(m - 1)). 

and P(x(n + l))|x(n)) is the transition probability to go from x(n) at t n to x(n+ 1) at t n+ \. 

Iterating back from t m to the beginning of the observation window at to gives us P(x(m) | Y(m)) 
in terms of a discrete time path integral along paths X = {x(m), x(m — 1), x(0)} through 
m + 1 time points in D-dimensional space: 

„ tn — 1 

P(x(m)|Y(m)) = / J] d D x(n)exp[-A (X,Y(m))}, (3) 

^ n=0 

where the action y4o(X, Y(m)) is given as 

m m—l 

A (X,Y(m)) = -£M/(x(n),y(n)|Y(n-l))-X; lo g P(x(n+l)|x(n))-logP(x(0)). (4) 

n=0 n=0 

The first term is the total conditional mutual information transferred from the observations 
to the model. The final term reflects uncertainty in the initial state at the beginning of the 
assimilation window. 

The conditional expectation value of a function on the path X is given by 

/ IC = o d D x(n) X (X) exp[-A (X, Y(m))] 



P[ X (X)|Y(m)]=<x(X)> 



/ Un=o d D x(n) exp[-A (X, Y(m))] 

/dX X (X)exp[-A (X,Y(m))] 
fdXexp[-A (X,Y{m))} 



(5) 
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The contributions of fluctuations along the path X are accounted for through the integral. 
We call this the dynamical estimation path integral (DEPI). There are no approximations 
made in the formulation of this discrete time path integral. 

In our considerations we are interested in the first through fourth moments of the com- 
ponents of X, namely, the [x a (n)] p ; a = 1, 2, D; n = 0, 1, m; p — 1, 2, 3, 4. This allows 
us to estimate the conditional mean trajectory, the RMS variation about it, as well as the 
skewness and kurtosis of the conditional distribution along the paths. This permits us to 
address the familiar assumption about the Gaussian nature of the conditional distribution 
where the skewness 



< (x a (ri)- < x a (ri) >) 3 > 
< (x a (n)- < x a (n) >) 2 > 3 / 2 ' 

and kurtosis 



(6) 



< (x a (n)- < x a (n) >) 4 > _ 3 

< (x a (n)- < x a {n) >) 2 > 2 

both vanish. 

In the limit where time is continuous, this problem has been formulated using approx- 
imations to the action A (X, Y(m)) J2TJ [22] [231 121], while in discrete time with similar 
approximations it has been considered as a noise reduction method [23]. We inherit one 
result from the analysis of continuous time: the temporal discretization of the noisy model 
equations 

^ = F(x(f),p)+irif) (8) 
should satisfy (x(n) = x(t„ = to + nAt)) for small At: 

x(n + l)-x(n) F(x(n + l),p) + F(x(n),p) , ,t n + t n+l 



At 



+ v(-^F^), (9) 



where r](t) is the noise term (or model error term), and At is the time step. This gives us 
the physically [21] correct form of the dynamics for small At: 

/ / \ / ..xv / / \ \ F(x(n+ l),p) + F(x(n),p) 
g a (x{n), x(n + 1), p) = x(n + 1) - x(n) - At^^ ^ 1 1 hy) (10) 

to use in the Markov transition probability P(x(n+ l)|x(n)). This result harks back at least 
to [26], [27] , and perhaps further. 

In an earlier paper [16J we discussed approximations to DEPI through the introduction 
of an effective action [28]. Loop expansions and renormalization group approximations to 
the integrals Eq. (jSJ) are standard in statistical physics. Here we wish to use many common 
assumptions about the structure of the action v4 (X, Y(m)) to see how the use of the path 
integral can reveal information about the orbits of a chaotic nonlinear system provided with 
occasional observations. The problem, fundamentally, whatever approximations one uses, is 
to evaluate a high dimensional integral of dimension (m + 1)D + K where we have K fixed 
parameters to estimate as well as D state variables at m+1 time locations. We use Monte 
Carlo methods for this purpose. 

The literature on Monte Carlo methods is vast and interesting [29j [30] [31] [32] [33] [34] [35] 
We have explored many different formulations of that technique, but here we report on 
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the use of the standard Metropolis-Hastings approach. A series of paths X*, i = 1,2, ...,N 
are generated according to the distribution exp[— A (X, Y(m))], and are then used to ap- 
proximate Eq. §5§ as 

1 N 

< x(x) >« - J>(X') (11) 

i=l 



3 Approximations to Ao(X, Y(m)) 

To proceed we must make some approximations to the terms in Aq(X.). (We no longer display 
the measurements Y(m).) We usually select the familiar assumption that the measurement 
noise is Gaussian at each time t n , though later we examine another noise distribution as 
well, and we assume that measurement errors are uncorrelated at different times. As noted 
in [37] there may be important physical settings where this assumption is incorrect. This 
situation would demand another approximation to the total conditional mutual information 
term in the action. 

For independent, Gaussian observation errors at each observation time we write for the 
first term in Aq(X.) 

m i m L 

-£M/(x(n),y(n)|Y(n-l)) = - £ £ ( yi (n) - a* (")) ( n ) - x v (n)), (12) 

n=0 1 n=0l,l'=l 

and, if the errors of an observation are independent of errors in other observations, then the 
L x L matrix R m is diagonal. We go even further for illustration purposes and consider the 
LxL matrix R m to be a multiple of the identity; this means the errors in various components 
of the observation vector are all of the same magnitude. Our approximation to the total 
conditional mutual information term in Aq(X.) takes the form 

m T3 m L 

-£M/(x(n),y(n)|Y(n-l)) = ^ £ £(yi(n) - ^(n)) 2 , (13) 

n=0 / n=0 1=1 

with R m a scalar. 

In the case of no model error and perfect model resolution, one would write 

P(x(n + l)|x(n)) = ^(g(x(«), x(n + 1), p)). (14) 

If we have resolution <jj in the model, this broadens the delta function to a smoother distri- 
bution, one representation of which is 



^(g(x(n),x(n+l),p)) 

This reduces to the delta function as a$ — > 0. This is certainly a minimalist representation 
of error in deterministic models, and it can hardly account properly for terms in the model 
being absent, for example, but it may give a sense of the effect of environmental noise as 
it impairs model output resolution. While there are more thoughtful discussions of model 



(2tt 



g(x(n),x(n + l),p) 
2aj 



(15) 
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error [38j EH HQ] , this simple assumption on model resolution will allow us to interpret the 
role of model error in assimilating information from data. 

The final term in the action is related to the initial distribution of the state variable at 
the start of data assimilation P(y(0)). We report here on calculations made assuming the 
initial conditions are distributed uniformly. This distribution has essentially no information 
about the orbits in state space, and one can, of course, do better with some knowledge of the 
state of the system at the beginning of data assimilation. In approaches such as the ensemble 
Kalman filter [ITj W2[ H3"] this information is placed in a mean or background state estimate 
at to and a covariance about this mean. Our experience showed that the initial distribution 
mattered very little as the dynamics takes the orbit from each initial condition and moves 
it onto or near the dynamical attractor because of the dissipation and the resulting strange 
attractor in the dynamics g a (x.{n + 1), x(n), p) = as suggested by the arguments of [38] . 

Our action then has two overall constants R m and Rf = \ as well as any parameters 

that enter the dynamical model g(x(n),x(n + l),p) = 0. By considering the parameters 
as state 'variables' satisfying p(n + 1) = p(n), we incorporate them into any estimation 
protocol we utilize. 



4 Evaluation of the Path Integral for the Lorenz96 
Model 

We have examined the DEPI method using the model of Lorenz [HJ SS] with D = 20 degrees 
of freedom for a range of R m and Rf values. The dynamical equations for this model are 

= w a _x(t)K +1 (t) - «v. 2 (t)) - w a (t) + /; a = 1, 2, D (16) 

at 

with W-i(t) = WD-x(t),Wo(t) = Wjj(t), and wo+iif) = wi(t), with D = 20, some choice of 
w a (0) and with the forcing parameter selected to be / = 8.17 creating chaotic orbits w(t). 
We generated a solution to the dynamical equations and added Gaussian white noise to the 
Wa{t n ) to act as our measurements: y(n) = w(n) + noise. 

This is a twin experiment in which the model equations are (Tl6l) for variables x a (t). 
We evaluated the first through fourth moments of the paths X so we could estimate the 
conditional mean < x a (n) >; a = 1,2, ...D;n = 0, 1,2, ...m as well as the RMS variation 
about this mean and the skewness and kurtosis associated with the distribution of x(n) at 
the temporal locations along the path. The latter tells us how the conditional probability 
differs from a Gaussian within and without any assimilation windows. 

The time step in solving these equations was At = 0.05 jHHl5] corresponding to 6 hours 
in physical time. We selected paths from a space of dimension (m + 1)D + K with m = 80 
time steps, D = 20 state variables and K — 1 parameters. We started from an arbitrary 
initial path and ignored the first 3 x 10 5 paths generated, and then used the next 1.2 x 10 6 
paths for the evaluation of the moments of X. The computation time per path generated 
will depend on the number of terms in the action that need to be re-computed when a single 
component of the path is updated. For the Lorenz96 model the number of terms that need 
to be computed to update the entire path is (8 + K){m + 1)D. 
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4.1 How many Observations are Required? 

It is necessary to decide how many observations are required to allow the estimation of the 
unobserved state variables. 

We know that as <7/ —>■ or Rf becomes very large, we reach the deterministic, no 
model-error, setting. In this limit the search over state and parameter values becomes a 
search in the space of initial conditions and parameters with dimension D + K, and is 
impeded by complex surfaces with many local minima in A (X) associated with instabilities 
on the synchronization manifold [l6l W7\ Xi(n) ~ Hi(n); I = 1,2,...,L. In our case where 
Rf is finite, we still see the remnants of this instability in the higher dimensional space of 
(m + 1)D + K dimensions, but it is much less of an issue. In Figure ([1]) we display the action 
A (X. fi na i) evaluated by solving the differential equation 

dX(s) dA {X{s)) 
ds ~ dX(s) 

for 100 random selections of X(s = 0) using the A Q (X) associated with the discrete time 
formulation of the Lorenz96, D = 20 model. This 'Langevin' equation (without noise here) 
has Ao(X) Lyapunov function, since 

dA Q (X(s)) _ /dX(a)\ 2 
ds \ ds J ' 

and converges to paths Xf ina i at locations in X space where 9A q^~^ — 0. Equation (|T7|) is of 

interest because, if one adds Gaussian noise of amplitude \/2 to the right hand side of this 
equation, the paths associated with that Langevin equation are distributed in X space as 
exp[-A (X)]. 

The distribution of values of the action at the locations Xf ina i are shown for L — 6, 8, 
and 10 observations, R m = 50, with Rf = 100 and with Rf = 500 in Figure (CQ). For the 
Monte Carlo approximation of the path integrals of Eq. ([5]) to be effective, the most probable 
paths should all be clustered around a well defined global minimum of Aq(X). Figure (0Q) 
suggests that there are two ways within the DEPI approach to make the surface of A (X) 
smoother. One way is to increase the number of observed variables. This causes the number 
of local minima of A (X) to be reduced, because information from measurements reduces 
the number of likely paths. The other way is to increase the uncertainty of the model, by 
decreasing Rf. 

In Figure ([I]) we investigate increasing the number of observed variables. The figure 
shows that L = 6 observations are not enough, but L = 8 are enough to smooth the surface. 
Even though there still some local minima for L > 8, the action around these minima is 
much larger than at the global minimum, so those paths contribute a negligible amount to 
the path integrals. 

In the deterministic case, we know that the minimum number of observations required to 
remove the instabilities is equal to the number of positive conditional Lyapunov exponents 
(CLEs) associated with the synchronization manifold [161117] for the model equations when 
they receive the obervations yi(t n ). We have examined this for the Lorenz96 model and 
found that about 0AD observations are needed to make all CLEs negative [15] . For this 
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Figure 1: Lorenz96, D = 20. A (Xf ina i) for 100 initial choices of X(s = 0) allowing the 
paths to evolve through Equation (fTTj) . with the number of observed variables L = 6,8, or 
10. Left: R f = 100. Right: R f = 500. 



reason we selected L = 8 for our analysis of the D = 20 dimensional Lorenz96 system. 
In our calculations we chose to 'observe' y a (n) for a = 0, 3, 5, 8, 10, 13, 15 and 18. It 
is not necessary to have observations at every time step, and in fact here we only provide 
observations for even n. The missing observation terms are excluded from the conditional 
mutual information contribution to the action. 

We emphasize that this criterion for selecting the required number of observations de- 
pends on the model, which is very useful. It also depends on the accuracy of the model, 
here represented by Rf. The more accurate the model, namely the larger Rf, the more local 
minima there will be in v4 (X), because of instabilities on the synchronization manifold of 
the deterministic problem [4"6l Wf\ . 

4.2 Results of Monte Carlo Estimation of the Path Integral for 
Moments of X 

We report here on example calculations using the Lorenz96 model with D = 20 and 8 
observed variables. We used a data assimilation window of 80 steps (0 < t < 4) with obser- 
vations every other step, followed by a prediction window (4 < t < 6) with no measurements. 
(At = 0.05 as above.) In most cases we considered, the moment calculations of x(n) in the 
prediction window were computed in the deterministic limit (Rf — > oo) by integrating the 
model equation forward in time using a 4th order Runge-Kutta procedure. The integration 
was done by taking initial conditions x(£ m ) and parameter values from each sampled path at 
the end of the assimilation window. This effectively evolves the whole conditional distribu- 
tion at the end of the assimilation window forward in time into the prediction window. We 
also did a simpler type of prediction for comparison, where the model equation is integrated 
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forward using only the conditional mean values < x(t m ) > and < p > at the end of the data 
assimilation window as initial conditions and parameters respectively. These two methods 
give quite comparable results, but the first provides more information about the distribution 
of states. 

4.2.1 Prediction by Model Equations for t > t m 

The results of two of these calculations are shown in Figures f T2l3l) for two representative state 
variables, Xo(t) (observed) and Xig(t) (unobserved). These behave in the same manner as 
the other eighteen variables, whether observed or not. We first created the 'true path' (solid 
black lines) by integrating the model equations with / = 8.17 and some choice of initial 
conditions on the attractor. This gives us all the state variables of the observed system. 
We then generated 'observations' (blue dots) from eight of the state variables by adding 
Gaussian noise with standard deviation a m = 0.353 to the true path. There is no correlation 
in the noise at different time steps or among different variables. 

We selected R m = 8 ~ and Rf = 100 for these calculations. The estimated states 

(for < t < 4) and the predicted states (for 4 < t < 6) are shown as green lines with red 
error bars representing the conditional mean plus or minus one standard deviation. Also 
shown in the Figures are the skewness and kurtosis of the state variables at each time step. 

The state estimates track the true path quite well in the assimilation and prediction 
windows for the observed as well as the unobserved state variables. The uncertainties of 
the predicted states grow in time, because the largest Lyapunov exponent of the model is 
positive, about 0.9 in units of inverse time [48J. This means that at the end of the prediction 
period, t — 6, the uncertainties should be about six times as large as the uncertainties at 
the end of the assimilation window, t = 4. 

In the example of Figures d2J) skewness and kurtosis are both close to zero in the assim- 
ilation window. They are slightly smaller in magnitude for Rf = 100 than for Rf = 500 
(not shown). This suggests that the conditional distributions are nearly Gaussian during 
the assimilation window, probably because of the influence of the measurements. The ratio 

is the determining factor. When this is sizeable, the Gaussian errors in the observations 
are important. When this ratio goes to zero, in the deterministic or zero model error limit, 
the non-Gaussian part of the action is dominant. When the assimilation window ends, the 
distribution is evolved according to the nonlinear dynamics of the model, and so it becomes 
much more non-Gaussian and less localized because of chaos. The distribution can become 
quite complicated possibly with the regions containing the most probable paths no longer 
contiguous in path space, and so extending the Monte Carlo evaluation into the prediction 
window is likely to be difficult. Nonetheless we show below an example that attempts this. 

This suggests that as the models of the observed process become better and better, 
namely, model error is reduced, the role of the nonlinear, non-Gaussian elements of the 
action Aq(X, Y(m)) will be more and more important. Approximations to the assimilation 
of information from measurements based on Gaussian assumptions may become less valuable 
in this circumstance. 
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4.2.2 Prediction within the Path Integral 

We also performed a Monte Carlo path integral calculation to estimate the moments both 
within the data assimilation window and within the prediction window, and show the results 
in Figure (jlj). In this we select paths to evaluate the path integral over the whole time 
interval < t < 6 including both a data assimilation and a window with no observations. 
The decreased resolution of the model dynamics represented in the path integral by finite 
Rf now plays a role in the quality of the predictions for t > 4. We see that the prediction 
is slightly worse than when we used the model equations for t > 4, while the deviation from 
Gaussianity in the data assimilation window is increased. 



4.2.3 Non-Gaussian Measurement Error 

In the use of the path integral method, or any other data assimilation approach actually, 
we do not know the statistics of the error in the measurements, and while the assumption 
that they are Gaussian is common, it is by no means necessary. To examine the implication 
of selecting another distribution of the errors, we represented the measurement errors by a 
Lorentzian distribution (also known as a Cauchy distribution) 

*M - jrh?- (19) 



This replaces the conditional mutual information term in the action by 

log(l+ i '" 

n=0 n=0 1=1 



m m L i t> \ 

£ M/(x(n),x(n)|Y(n - 1)) = 4 £ £ log 1 + ^( W (n) - *,(n)) 2 , (20) 



Figure (j5J) shows the results of this calculation, using the same measurement data as the 
other two examples, and with Rf = 100, R m = 8. This change does not make a substantial 
difference for the values of R m and Rf we used, except that the conditional distributions of 
the observed variables in the assimilation window are slightly less consistent with a Gaussian 
than before. The exploration of the effect of non-Gaussian measurement noise distributions 
can be accomplished in a straightforward manner through the use of DEPI. 



4.2.4 Annealing as a Monte Carlo Tool 

Figure (jTJ suggests that it might be useful to gradually decrease the model resolution during 
the initialization phase to avoid getting trapped in local minima and completely missing the 
global minimum. We do this in all the examples by replacing Rf by f3Rf in the action. 
We start the simulated annealing process with (3 = 0.01 on the first iteration and end with 
(3 = 1.00 on the 200,000 th iteration, by multiplying (3 by a constant factor on every iteration, 
and then do another 100,000 initialization iterations before recording any path statistics. 
We found that typically the paths near a local minima of A (X.) differ from the true path 
at early times, but coincide at later times, because the dynamics is dissipative. All paths 
contribute to the path integral, however these paths have significantly larger action than 
paths around the global minimum, and so have a negligible contribution because of the 
exponential weighting factor. 
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Figure 2: Observed variable Xo(t). Left: Conditional Mean (Green) and RMS-error (Red) 
along with Gaussian distributed noisy observations (Blue) and known Xo(t) (Black). Right: 
Skewness (Blue) and Kurtosis (Red) of x (t). These are Monte Carlo estimates from the 
path integral for the Lorenz96 model with D = 20; R m — 8, Rf — 100. The assimilation 
window is < t < 4, and the prediction window is 4 < t < 6. The parameter estimate is 
/ = 8.25 ± 0.09. Predictions for t > 4 are made with a fourth order Runge-Kutta procedure 
using information on the parameter and state variables at t = 4. 

5 Discussion 

We have presented the problem of incorporating information from noisy observations into 
model dynamics of the observed system as an exact discrete time path integral along orbits of 
the state variables of the model. The density of paths in the state space of the orbits is given 
by exp[— yl (X)] where A (X) is an action composed of terms conveying information from 
the measurements to the model, terms propagating the model between observations, and the 
distribution of states at the beginning of the temporal observation window. As expectation 
values of functions in the state space X, x(X), are integrals over paths with this density, 
one is presented with high dimensional integrals to perform, and we have investigated the 
use of Monte Carlo methods for this purpose, after making simplifying assumptions about 
the elements of the action. 

By selecting the function x(X) as the first through fourth powers of the state variables, 
we have evaluated the conditional mean state along the path as well as the RMS variation 
about the mean, and the skewness and kurtosis associated with fluctuations about that 
conditional mean. All moments are conditioned on the observations. As an example, much 
studied in the geophysical literature, we selected the model of Lorenz [H] with D = 20 state 
variables and one fixed, forcing parameter, working in a region of forcing where orbits of the 
model are chaotic. We performed a twin experiment wherein the data is generated by the 
model, noise is added to those orbits, and the model is used as a nonlinear filter through the 
path integral to estimate the value of the unobserved state variables as well as the forcing 
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Figure 3: Unobserved variable Xig(t). Left: Conditional Mean (Green) and RMS-error (Red) 
along with Gaussian distributed noisy observations (Blue) and known xi$(t) (Black). Right: 
Skewness (Blue) and Kurtosis (Red) of x\g(t). These are Monte Carlo estimates from the 
path integral for the Lorenz96 model with D = 20; R m = 8,Rf = 100. The assimilation 
window is < t < 4, and the prediction window is 4 < t < 6. The parameter estimate is 
/ = 8.25 ± 0.09. Predictions for t > 4 are made with a fourth order Runge-Kutta procedure 
using information on the parameter and state variables at t = 4. 

parameter. 

The number of observed states required to allow the estimation of the remaining states 
of the model system and any fixed parameters of the model is an important issue to be 
addressed, for with L observations and a D >> L dimensional model as is typical, one needs 
some sense of L in order to proceed. In the case where there is no model error and the 
dynamics has perfect resolution, we know the number of pieces of information required from 
the observations should be, at minimum, the number of conditional Lyapunov exponents of 
the nonlinear dynamics of the model. When one has model errors and reduced resolution in 
the model state space, we showed that one may use properties of the action to estimate how 
many observations are required. 

The tool we proposed is to look for minima in the action by solving the 'Langevin' 
equation for the paths as they evolve in "time" called s: 

<«(*) dA (X(s)) 
ds dX(s) ' K ' 

starting at a selection of initial values X(s = 0) and locating the minima associated with the 
final element to which this moves in s. For too few observations, there are many local minima, 
reflecting the complex structure of A Q (X) in state and parameter space associated with the 
instability of the manifold where the model output synchronizes with the observations [46j 
[47]. Adding measurements smoothes out the surfaces explored by this 'time' evolution when 
model errors or diminished model resolution is present. 
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Figure 4: Observed variable xo(t). Left: Conditional Mean (Green) and RMS-error (Red) 
along with Gaussian distributed noisy observations (Blue) and known Xo(t) (Black). Right: 
Skewness (Blue) and Kurtosis (Red) of Xo(t). These are Monte Carlo estimates from the 
path integral for the Lorenz96 model with D = 20; R m = 8,Rf = 100. The assimilation 
window is < t < 4, and the prediction window is 4 < t < 6. The parameter estimate 
is f — 8.24 ± 0.09. Predictions are made within the Monte Carlo evaluation of the path 
integral. 

The use of this tool is suggested by the fact that adding Gaussian noise to the Langevin 
equation leads to the density of paths exp[— v4 (X)] required for the path integral. The 
examination of the minima of A (X.) is essentially the four dimensional variational principle 
4DVAR [S] in the context of the path integral [35]. The path integral allows the evaluation 
of corrections due to fluctuations about this 'optimal' path. 

We then used a quite standard Monte Carlo Metropolis-Hastings [291 [30] method to select 
paths for evaluation of the moments desired. This is surely not the most computationally 
efficient Monte-Carlo approach, and, indeed, using properties of the Langevin equation to 
select paths may be much more efficient [211 EE] • 

We showed that in the example of the D = 20 dimensional Lorenz96 model, we were 
able to accurately estimate both the unobserved model state variables and the fixed forcing 
parameter with Monte Carlo methods applied to the path integral. We predicted the devel- 
opment of the model states after the data assimilation window closed by using the states 
and parameter at the end of this window as initial conditions in the deterministic model 
equations and as part of the continued use of the path integral itself. The former method 
gave quite accurate forecasts limited by the natural chaotic behavior of the model system 
which enhances any error in the estimates at the end of the assimilation window. The growth 
of this error in the deterministic model forecast is consistent with the known values of the 
largest Lyapunov exponent of the Lorenz96 model. 

When we used the path integral to make forecasts after the data assimilation window 
closed, we were successful again in those predictions, though less so than when using the 
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Figure 5: Observed variable xo(t). Left: Conditional Mean (Green) and RMS-error (Red) 
along with Gaussian distributed noisy observations (Blue) and known xo(t) (Black). Right: 
Skewness (Blue) and Kurtosis (Red) of xo(t). These are Monte Carlo estimates from the 
path integral for the Lorenz96 model with D = 20; R m = 8,Rf = 100. The assimilation 
window is < t < 4, and the prediction window is 4 < t < 6. In these calculations the 
conditional mutual information term in the action represents the measurement errors as 
Lorentzian, Equation fl20|) . The parameter estimate is / = 8.24 ±0.08. Predictions for t > 4 
are made with a fourth order Runge-Kutta procedure using information on the parameter 
and state variables at t — 4. 

deterministic model equations. As we introduced explicit loss of model resolution into the 
path integral formulation, this should not be a surprise. This use of the path integral for 
both assimilation of observed information and forecasting may prove valuable. 

As a final calculation we investigated the assumption of Gaussian measurement errors in 
the formulation of the path integral. We used our 'data' generated by adding Gaussian noise 
to the model output, and assumed that in the formulation of the path integral the condi- 
tional mutual information term was represented by a Lorentzian distribution of measurement 
errors. This had very little effect on the conditional means of the state variables and forcing 
parameter. The calculated deviations from Gaussian distributions of the estimated state 
variables, represented by nonzero skewness and kurtosis, were larger in this setting than 
when Gaussian measurement errors were assumed, but not significantly so. 

By examining the skewness and kurtosis of the estimates of the state variables we con- 
clude that even though the model dynamics is both non-Gaussian and chaotic, during the 
assimilation period the deviations from Gaussianity may remain small when the relative 
weight of the conditional mutual information, represented by our parameter R m remains 
large enough relative to the representative of model error, called Rf here. When Rf is in- 
creased relative to R m , the non-Gaussian contributions of model error to the action increase, 
and larger contributions to the estimated skewness and kurtosis result. This is especially 
clear from our results in the prediction windows. When we use the deterministic model equa- 
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tions to predict, we effectively set Rf — > oo, and it is clear that the skewness and kurtosis 
grow rapidly as the orbits move to the strange attractor of the system. This is also true 
when we use the path integral, as in Figure (4), to provide the moments in the prediction 
window. 

We conclude from these investigations that one can determine when assumptions about 
the Gaussianity of the conditional distribution of state variables might be a good approxi- 
mation, and in those situations the use of Kalman like filtering approaches can be produc- 
tive [JU H21 US]- Before assuming statistics one might do well to use our results to examine 
the situation for a given model, set of parameters, and collection of observations. As one 
improves the model resolution and the representation of physical processes, one can expect 
such linear approaches to become inadequate. 
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